40 research outputs found

    Expanded Parts Model for Semantic Description of Humans in Still Images

    We introduce an Expanded Parts Model (EPM) for recognizing human attributes (e.g. young, short hair, wearing a suit) and actions (e.g. running, jumping) in still images. An EPM is a collection of part templates learnt discriminatively to explain specific scale-space regions in the images (in human-centric coordinates). This contrasts with current models, which consist of relatively few (i.e. a mixture of) 'average' templates. An EPM uses only a subset of the parts to score an image, and scores the image sparsely in space, i.e. it ignores redundant and random background in an image. To learn our model, we propose an algorithm which automatically mines parts and learns the corresponding discriminative templates, together with their respective locations, from a large number of candidate parts. We validate our method on three recent challenging datasets of human attributes and actions, obtaining convincing qualitative and state-of-the-art quantitative results. Comment: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
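
    A minimal sketch of the sparse scoring idea, assuming linear part templates and precomputed region descriptors (all shapes and names here are illustrative; the paper's actual part mining and learning procedure is more involved):

        import numpy as np

        def epm_score(region_feats, templates, k=10):
            """Score an image as the mean of its k strongest part responses.

            region_feats: (n_regions, d) descriptors of candidate scale-space regions
            templates:    (n_parts, d) discriminatively learnt part templates
            k:            number of parts allowed to explain the image (sparsity)
            """
            responses = templates @ region_feats.T   # (n_parts, n_regions)
            best_per_part = responses.max(axis=1)    # each part picks its best region
            top_k = np.sort(best_per_part)[-k:]      # only k parts score the image;
            return top_k.mean()                      # background regions are ignored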

    Hybrid multi-layer Deep CNN/Aggregator feature for image classification

    Deep Convolutional Neural Networks (DCNN) have established a remarkable performance benchmark in the field of image classification, displacing classical approaches based on hand-tailored aggregations of local descriptors. Yet DCNNs impose high computational burdens both at training and at testing time, and training them requires collecting and annotating large amounts of training data. Supervised adaptation methods have been proposed in the literature that partially re-learn a transferred DCNN structure from a new target dataset. Yet these require expensive bounding-box annotations and are still computationally expensive to learn. In this paper, we address these shortcomings of DCNN adaptation schemes by proposing a hybrid approach that combines conventional, unsupervised aggregators such as Bag-of-Words (BoW) with the DCNN pipeline, treating the output of intermediate layers as densely extracted local descriptors. We test a variant of our approach that uses only intermediate DCNN layers on the standard PASCAL VOC 2007 dataset and show performance significantly higher than the standard BoW model and comparable to Fisher vector aggregation, but with a feature that is 150 times smaller. A second variant of our approach that includes the fully connected DCNN layers significantly outperforms Fisher vector schemes and performs comparably to DCNN approaches adapted to PASCAL VOC 2007, yet at only a small fraction of the training and testing cost. Comment: Accepted at ICASSP 2015, 5 pages including references, 4 figures and 2 tables.
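
    A sketch of the hybrid pipeline under stated assumptions (VGG-16, the cut point, the codebook size, and `train_images` are all illustrative, not the paper's exact configuration): an intermediate conv layer provides dense local descriptors, which a conventional unsupervised BoW aggregator turns into a compact image feature.

        import numpy as np
        import torch, torchvision
        from sklearn.cluster import MiniBatchKMeans

        # Truncated CNN: an intermediate conv block acts as a dense
        # local-descriptor extractor (model and cut point are assumptions).
        cnn = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:17].eval()

        def local_descriptors(img):                  # img: (3, H, W) float tensor
            with torch.no_grad():
                fmap = cnn(img.unsqueeze(0))[0]      # (C, h, w) feature map
            return fmap.flatten(1).T.numpy()         # (h*w, C): one descriptor per location

        # Unsupervised codebook, as in classical BoW; `train_images` is an
        # assumed list of image tensors.
        codebook = MiniBatchKMeans(n_clusters=1024)
        codebook.fit(np.vstack([local_descriptors(im) for im in train_images]))

        def bow_feature(img):
            words = codebook.predict(local_descriptors(img))
            hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
            return hist / hist.sum()                 # compact image-level feature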

    From Images to Shape Models for Object Detection

    We present an object class detection approach which fully integrates the complementary strengths offered by shape matchers. Like an object detector, it can learn class models directly from images, and can localize novel instances in the presence of intra-class variations, clutter, and scale changes. Like a shape matcher, it finds the boundaries of objects, rather than just their bounding-boxes. This is achieved by a novel technique for learning a shape model of an object class given images of example instances. Furthermore, we also integrate Hough-style voting with a non-rigid point matching algorithm to localize the model in cluttered images. As demonstrated by an extensive evaluation, our method can localize object boundaries accurately and does not need segmented examples for training (only bounding-boxes).
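
    As a rough illustration of the Hough-style voting component (a generic sketch, not the paper's exact formulation): each matched local feature casts a weighted vote for the object centre, and peaks in the accumulator become detection hypotheses that the non-rigid point matcher then refines.

        import numpy as np

        def hough_votes(matches, bins=(50, 50), img_size=(480, 640)):
            """Accumulate votes for the object centre in a coarse 2D grid.

            matches: list of (feature_xy, offset_xy, weight), where offset_xy
            is the model-predicted displacement from feature to object centre.
            """
            acc = np.zeros(bins)
            for (fx, fy), (ox, oy), w in matches:
                cx, cy = fx + ox, fy + oy            # voted centre location
                i = int(cy / img_size[0] * bins[0])
                j = int(cx / img_size[1] * bins[1])
                if 0 <= i < bins[0] and 0 <= j < bins[1]:
                    acc[i, j] += w
            # The accumulator peak is an object-location hypothesis, later
            # refined by non-rigid matching of the learnt shape model.
            return np.unravel_index(acc.argmax(), acc.shape), acc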

    Unifying discriminative visual codebook generation with classifier training for object category recognition

    The idea of representing images using a bag of visual words is currently popular in object category recognition. Since this representation is typically constructed using unsupervised clustering, the resulting visual words may not capture the desired information. Recent work has explored the construction of discriminative visual codebooks that explicitly consider object category information. However, since the codebook generation process is still disconnected from that of classifier training, the set of resulting visual words, while individually discriminative, may not be the one best suited for the classifier. This paper proposes a novel optimization framework that unifies codebook generation with classifier training. In our approach, each image feature is encoded by a sequence of “visual bits” optimized for each category.
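
    A toy sketch of the unification idea, loudly hypothetical: the alternating scheme below is a crude stand-in for the paper's per-category "visual bit" optimization, intended only to show codewords and a classifier being updated against a shared objective rather than the codebook being fixed by clustering first.

        import numpy as np

        def unified_codebook(X, y, n_words=64, iters=10, lr=0.5):
            """X: list of (n_i, d) local-descriptor arrays, one per image;
            y: array of +/-1 image labels."""
            rng = np.random.default_rng(0)
            words = rng.normal(size=(n_words, X[0].shape[1]))
            w = np.zeros(n_words)
            for _ in range(iters):
                # Encode: hard-assign every descriptor to its nearest codeword.
                assign = [np.argmin(((f[:, None] - words) ** 2).sum(-1), axis=1) for f in X]
                H = np.stack([np.bincount(a, minlength=n_words) / len(a) for a in assign])
                # Classifier step: one hinge-loss gradient update on the histograms.
                viol = (y * (H @ w)) < 1
                if viol.any():
                    w += lr * (H[viol] * y[viol][:, None]).mean(axis=0)
                # Codebook step: re-centre each word on descriptors from images
                # the classifier still gets wrong, so the visual words drift
                # toward what the classifier needs, not what clustering chose.
                for k in range(n_words):
                    pts = [f[a == k] for f, a, v in zip(X, assign, viol) if v and (a == k).any()]
                    if pts:
                        words[k] = np.vstack(pts).mean(axis=0)
            return words, w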

    Detecting Overfitting of Deep Generative Networks via Latent Recovery

    State-of-the-art deep generative networks are capable of producing images with such incredible realism that they can be suspected of memorizing training images. This is why it is not uncommon to include visualizations of training-set nearest neighbors, to suggest that generated images are not simply memorized. We demonstrate that this is not sufficient, and motivate the need to study memorization/overfitting of deep generators with more scrutiny. This paper addresses the question by i) showing how simple losses are highly effective at reconstructing images for deep generators, and ii) analyzing the statistics of reconstruction errors when reconstructing training and validation images, which is the standard way to analyze overfitting in machine learning. Using this methodology, this paper shows that overfitting is not detectable in the pure GAN models proposed in the literature, in contrast with those using hybrid adversarial losses, which are amongst the most widely applied generative methods. The paper also shows that standard GAN evaluation metrics fail to capture memorization for some deep generators. Finally, the paper shows how off-the-shelf GAN generators can be successfully applied to face inpainting and face super-resolution using the proposed reconstruction method, without hybrid adversarial losses.
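
    A minimal sketch of latent recovery with a simple pixel loss, assuming a pre-trained generator `G` mapping a latent vector to an image (latent size, optimizer, and step count are illustrative):

        import torch

        def recover_latent(G, target, z_dim=512, steps=500, lr=0.05):
            """Optimise a latent code so that G(z) reconstructs a target image.

            Comparing the distribution of final errors on training vs. held-out
            images is the overfitting test: memorised training images
            reconstruct conspicuously better than validation images.
            """
            z = torch.randn(1, z_dim, requires_grad=True)
            opt = torch.optim.Adam([z], lr=lr)
            for _ in range(steps):
                opt.zero_grad()
                loss = torch.nn.functional.mse_loss(G(z), target)
                loss.backward()
                opt.step()
            return z.detach(), loss.item()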

    Empowering the trustworthiness of ML-based critical systems through engineering activities

    This paper reviews the entire engineering process of trustworthy Machine Learning (ML) algorithms designed to equip critical systems with advanced analytics and decision functions. We start from the fundamental principles of ML and describe the core elements that condition its trust, particularly through its design: namely domain specification, data engineering, design of the ML algorithms, their implementation, evaluation and deployment. These components are organized into a unique framework for the design of trusted ML systems. Comment: This work has been supported by the French government under the "France 2030" program, as part of the SystemX Technological Research Institute.

    Solution of the Simultaneous Pose and Correspondence Problem Using Gaussian Error Model

    The use of hypothesis verification is recurrent in the model-based recognition literature. Verification consists of measuring how many model features, transformed by a pose, coincide with image features. When the data involved in computing the pose are noisy, the pose is inaccurate and difficult to verify, especially when the objects are partially occluded. To address this problem, the noise in image features is modeled by a Gaussian distribution. A probabilistic framework then allows evaluating the probability of a match, given that the pose belongs to a rectangular volume of the pose space; when the transformation is affine, this involves quadratic programming. This matching probability drives an algorithm that computes the best pose through a recursive multiresolution exploration of the pose space, discarding outliers in the match data as the search progresses. Numerous experimental results are described, consisting of 2D and 3D recognition experiments using the proposed algorithm.
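
    A schematic of the recursive multiresolution exploration, with the paper's probabilistic Gaussian bound replaced by a simple geometric consistency test over 2D affine poses (a deliberate simplification; parameter names and tolerances are illustrative):

        import heapq
        import numpy as np

        def search_pose(matches, root_box, tol=2.0, min_size=1e-3):
            """Best-first search over boxes of affine parameters (a,b,c,d,tx,ty),
            where (mx,my) maps to (a*mx + b*my + tx, c*mx + d*my + ty).

            The bound counts matches that *could* hold somewhere in a box; it
            only shrinks as boxes are split, so outliers are discarded as the
            search refines toward the pose best aligning model and image.
            """
            def bound(lo, hi):
                n = 0
                for (mx, my), (ix, iy) in matches:
                    # Extremes of a monotone map occur at box corners.
                    xs = [a * mx + b * my + t for a in (lo[0], hi[0])
                          for b in (lo[1], hi[1]) for t in (lo[4], hi[4])]
                    ys = [c * mx + d * my + t for c in (lo[2], hi[2])
                          for d in (lo[3], hi[3]) for t in (lo[5], hi[5])]
                    if min(xs) - tol <= ix <= max(xs) + tol and \
                       min(ys) - tol <= iy <= max(ys) + tol:
                        n += 1
                return n

            lo, hi = (np.asarray(b, float) for b in root_box)
            heap = [(-bound(lo, hi), lo.tolist(), hi.tolist())]
            while heap:
                neg, lo, hi = heapq.heappop(heap)
                lo, hi = np.array(lo), np.array(hi)
                k = int(np.argmax(hi - lo))          # split the widest parameter
                if hi[k] - lo[k] < min_size:
                    return (lo + hi) / 2, -neg       # pose estimate, match count
                mid = (lo[k] + hi[k]) / 2
                for clo, chi in ((lo, np.r_[hi[:k], mid, hi[k+1:]]),
                                 (np.r_[lo[:k], mid, lo[k+1:]], hi)):
                    heapq.heappush(heap, (-bound(clo, chi), clo.tolist(), chi.tolist()))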

    Model-based object tracking in cluttered scenes with occlusions

    In this paper, we propose an efficient method for tracking 3D modelled objects in cluttered scenes. Rather than tracking objects in the image, our approach relies on the object-recognition aspect of tracking. Candidate matches between image and model features define volumes in the space of transformations, and the volumes of the pose space satisfying the maximum number of correspondences are those that best align the model with the image. Object motion defines a trajectory in the pose space. We give results showing that the presented method can track objects even when they are totally occluded for a short while, without assuming any motion model and at low computational cost (below 200 ms per frame on a basic workstation). Furthermore, this algorithm can also be used to initialize the tracking.
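
    A schematic of tracking as per-frame recognition in pose space, reusing a pose-search routine like the one sketched under the previous abstract (all names here are assumptions): each frame restarts the search in a box around the previous pose, so no motion model is needed and short total occlusions are survived.

        def track(frames, match_fn, search_pose, init_box, margin=0.2, min_inliers=6):
            """Per-frame recognition-based tracking along a pose-space trajectory.

            match_fn(frame) -> candidate (model_pt, image_pt) correspondences;
            search_pose(matches, box) -> (pose, n_inliers), as sketched above.
            """
            box, poses = init_box, []
            for frame in frames:
                pose, inliers = search_pose(match_fn(frame), box)
                if inliers >= min_inliers:           # object found: recentre the box
                    box = (pose - margin, pose + margin)
                # else: occlusion; keep searching the same pose volume next
                # frame, with no motion model assumed.
                poses.append(pose if inliers >= min_inliers else None)
            return poses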